Maximum Likelihood and Covariant Algorithms for Independent Component Analysis

Author

  • David J.C. MacKay
Abstract

Bell and Sejnowski (1995) have derived a blind signal processing algorithm for a non-linear feedforward network from an information-maximization viewpoint. This paper first shows that the same algorithm can be viewed as a maximum likelihood algorithm for the optimization of a linear generative model. Second, a covariant version of the algorithm is derived. This algorithm is simpler and somewhat more biologically plausible, involving no matrix inversions, and it converges in a smaller number of iterations. Third, this paper gives a partial proof of the ‘folk theorem’ that any mixture of sources with high-kurtosis histograms is separable by the classic ICA algorithm. Fourth, a collection of formulae is given that may be useful for the adaptation of the non-linearity in the ICA algorithm.

1 Blind separation

Algorithms for blind separation (Jutten and Herault 1991; Comon et al. 1991; Bell and Sejnowski 1995; Hendin et al. 1994) attempt to recover source signals $\mathbf{s}$ from observations $\mathbf{x}$ which are linear mixtures (with unknown coefficients $\mathbf{V}$) of the source signals:

$$\mathbf{x} = \mathbf{V}\mathbf{s}. \qquad (1)$$

The algorithms attempt to create the inverse of $\mathbf{V}$ (within a post-multiplicative factor) given only a set of examples $\{\mathbf{x}\}$. Bell and Sejnowski (1995) have derived a blind separation algorithm from an information-maximization viewpoint. The algorithm may be summarised as a linear mapping

$$\mathbf{a} = \mathbf{W}\mathbf{x} \qquad (2)$$

followed by a non-linear map

$$z_i = \phi_i(a_i), \qquad (3)$$

where, for example, $\phi_i(a_i) = -\tanh(a_i)$, with the learning rule

$$\Delta \mathbf{W} \propto [\mathbf{W}^{\mathsf{T}}]^{-1} + \mathbf{z}\mathbf{x}^{\mathsf{T}}. \qquad (4)$$

(Another non-linear function of $a_i$, $y_i = g(a_i)$, is also mentioned by Bell and Sejnowski, but it will not be needed here.)

This paper has four parts. First it is shown that Bell and Sejnowski's (1995) algorithm may be derived as a maximum likelihood algorithm. This has been independently pointed out by Pearlmutter and Parra (1996), who also give an exciting generalization of the ICA algorithm. Second, it is pointed out that the algorithm (4) is not covariant, and a covariant algorithm is described which is simpler, faster, and somewhat more biologically plausible; a numerical sketch of both updates follows below. This covariant algorithm has been independently suggested by Amari et al. (1996) and is used by Pearlmutter and Parra (1996). Third, this paper gives a partial proof of the ‘folk theorem’ that any mixture of sources with high-kurtosis histograms is separable by the classic ICA algorithm. Fourth, a collection of formulae is given that may be useful for the adaptation of the non-linearity in the ICA algorithm.
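To make the updates concrete, here is a minimal numpy sketch (not part of the paper) of rule (4) and, for comparison, the covariant natural-gradient form $\Delta \mathbf{W} \propto (I + \mathbf{z}\mathbf{a}^{\mathsf{T}})\mathbf{W}$ reported by Amari et al. (1996). The Laplacian sources, mixing matrix, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (illustrative assumptions): I = J = 2 high-kurtosis (Laplacian)
# sources, mixed by an unknown matrix V as in eq. (1): x = V s.
I, N = 2, 10_000
s = rng.laplace(size=(I, N))
V = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = V @ s

def classic_update(W, xb, eta=0.01):
    """One step of rule (4): dW ∝ [W^T]^{-1} + z x^T, averaged over the batch."""
    a = W @ xb                     # eq. (2): a = W x
    z = -np.tanh(a)                # eq. (3): z_i = phi_i(a_i)
    return W + eta * (np.linalg.inv(W.T) + (z @ xb.T) / xb.shape[1])

def covariant_update(W, xb, eta=0.01):
    """Covariant form (Amari et al. 1996): dW ∝ (I + z a^T) W; no matrix inverse."""
    a = W @ xb
    z = -np.tanh(a)
    G = np.eye(W.shape[0]) + (z @ a.T) / xb.shape[1]
    return W + eta * G @ W

W = np.eye(I)
for epoch in range(20):
    for t in range(0, N, 100):
        W = covariant_update(W, x[:, t:t + 100])   # or classic_update

# If separation succeeded, W V is close to a scaled permutation matrix.
print(np.round(W @ V, 2))
```

Either update can be dropped into the same loop; the covariant form avoids inverting $\mathbf{W}^{\mathsf{T}}$ at every step, which is where its claimed simplicity and speed come from.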
2 Maximum likelihood derivation of ICA

2.1 Latent variable models

Many statistical models are generative models that make use of latent variables to describe a probability distribution over observables (Everitt 1984). Examples of latent variable models include mixture models, which model the observables as coming from a superposed mixture of simple probability distributions (Hanson et al. 1991) (the latent variables are the unknown class labels of the examples); hidden Markov models (Rabiner and Juang 1986); factor analysis; Helmholtz machines (Hinton et al. 1995; Dayan et al. 1995); and density networks (MacKay 1995; MacKay 1996). Note that it is usual for the latent variables to have a simple distribution, often a separable distribution. Thus when we learn a latent variable model, we are finding a description of the data in terms of independent components. One thus might expect that an ‘independent component analysis’ algorithm should have a description in terms of a generative latent variable model. And this is indeed the case: independent component analysis is latent variable modelling.

2.2 The generative model

Let us model the observable vector $\mathbf{x} = \{x_j\}_{j=1}^{J}$ as being generated from latent variables $\mathbf{s} = \{s_i\}_{i=1}^{I}$ via a linear mapping $\mathbf{V}$. The simplest derivation results if we assume $I = J$, i.e., the number of sources is equal to the number of observations. The data we obtain are a set of $N$ observations $D = \{\mathbf{x}^{(n)}\}_{n=1}^{N}$. We assume that the latent variables are independently distributed, with marginal distributions $P(s_i \mid \mathcal{H}) \equiv p_i(s_i)$. Here $\mathcal{H}$ denotes the assumed form of this model and the assumed probability distributions $p_i$ of the latent variables. The probability of the observables and the hidden variables, given $\mathbf{V}$ and $\mathcal{H}$, is:

$$P(\{\mathbf{x}^{(n)}\}_{n=1}^{N}, \{\mathbf{s}^{(n)}\}_{n=1}^{N} \mid \mathbf{V}, \mathcal{H}) = \prod_{n=1}^{N} \left[ P(\mathbf{x}^{(n)} \mid \mathbf{s}^{(n)}, \mathbf{V}, \mathcal{H}) \prod_{i} p_i\!\left(s_i^{(n)}\right) \right].$$
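As a sanity check on this generative view, here is a minimal sketch (not from the paper; the Laplacian priors $p_i$ and the mixing matrix are illustrative assumptions). For the deterministic model $\mathbf{x} = \mathbf{V}\mathbf{s}$ with $I = J$, the standard change of variables gives $P(\mathbf{x} \mid \mathbf{V}, \mathcal{H}) = |\det \mathbf{W}| \prod_i p_i(a_i)$, where $\mathbf{W} = \mathbf{V}^{-1}$ and $\mathbf{a} = \mathbf{W}\mathbf{x}$; the code samples from the model and checks that the true unmixing matrix attains a higher average log-likelihood than a mismatched one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: I = J = 2 independent Laplacian latents, p_i(s) = exp(-|s|)/2.
I, N = 2, 5_000
s = rng.laplace(size=(I, N))
V = np.array([[1.0, 0.6],
              [0.4, 1.0]])          # illustrative mixing matrix
x = V @ s                           # generative model: x = V s

def avg_log_likelihood(W, x):
    """Average per-example log P(x | W, H) = log|det W| + sum_i log p_i(a_i)."""
    a = W @ x                                             # candidate recovered sources
    log_prior = np.sum(-np.abs(a) - np.log(2.0), axis=0)  # Laplacian log p_i(a_i)
    _sign, logdet = np.linalg.slogdet(W)                  # log|det W|
    return logdet + log_prior.mean()

print(avg_log_likelihood(np.linalg.inv(V), x))   # true unmixing matrix: higher
print(avg_log_likelihood(np.eye(I), x))          # mismatched W: lower
```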


Publication date: 1996